27 research outputs found
Latent Syntactic Structure-Based Sentiment Analysis
People share their opinions about products, movies and services on social media. Analyzing the sentiment of this text is a gold mine for marketing experts, so automatic sentiment analysis is a popular area of applied artificial intelligence. We propose a latent syntactic structure-based approach to sentiment analysis that requires only sentence-level polarity labels for training. Our experiments on three domains (movies, IT products, restaurants) show that a sentiment analyzer that exploits syntactic parses and has access only to sentence-level polarity annotation for in-domain sentences can outperform state-of-the-art models trained on out-of-domain parse trees with sentiment annotation for each node. In practice, millions of sentence-level polarity annotations are usually available for a particular domain, so our approach can be used to train a sentiment analyzer for a new domain while still exploiting the syntactic structure of sentences.
Extending Multilingual Machine Translation through Imitation Learning
Despite the growing variety of languages supported by existing multilingual
neural machine translation (MNMT) models, most of the world's languages are
still being left behind. We aim to extend large-scale MNMT models to a new
language, allowing for translation between the newly added and all of the
already supported languages in a challenging scenario: using only a parallel
corpus between the new language and English. Previous approaches, such as
continued training on parallel data including the new language, suffer from
catastrophic forgetting (i.e., performance on other languages is reduced). Our
novel approach Imit-MNMT treats the task as an imitation learning process,
which mimics the behavior of an expert, a technique widely used in the
computer vision area, but not well explored in NLP. More specifically, we
construct a pseudo multi-parallel corpus of the new and the original languages
by pivoting through English, and imitate the output distribution of the
original MNMT model. Extensive experiments show that our approach significantly
improves the translation performance between the new and the original
languages, without severe catastrophic forgetting. We also demonstrate that our
approach is capable of solving copy and off-target problems, which are two
common issues in current large-scale MNMT models.
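As an illustration of what "imitating the output distribution" of an expert model could look like, the following is a minimal sketch, not the Imit-MNMT implementation. It assumes the learner is trained to minimize a per-token KL divergence between the expert's and the learner's output distributions; the abstract does not name the loss, so the choice of KL and the names `softmax`, `kl_divergence` and `imitation_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def imitation_loss(expert_logits, learner_logits):
    """Mean per-token KL(expert || learner).

    Both arguments are lists of per-token logit vectors.
    Assumption: KL is the imitation objective; the source does not specify it.
    """
    losses = [
        kl_divergence(softmax(e), softmax(l))
        for e, l in zip(expert_logits, learner_logits)
    ]
    return sum(losses) / len(losses)


# Toy example: two target tokens over a three-word vocabulary.
expert = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
learner = [[1.0, 1.0, 1.0], [0.1, 3.0, 0.2]]
loss = imitation_loss(expert, learner)
```

The loss is zero when the learner reproduces the expert's distribution exactly and grows as the two distributions diverge.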
Automatic Generation of Domain-Specific Polarity Lexicons for Hungarian (Doménspecifikus polaritáslexikonok automatikus előállítása magyar nyelvre)
Social media has become highly popular, and large amounts of text are now available on almost any topic. As a result, opinion detection methods, whose task is to classify texts by the polarity of their content, have received considerable attention. So-called polarity lexicons, which carry information about the polarity of individual words, help solve this task. In this work we present several methods for creating lexicons, as well as for extending them and adapting them to other domains. We ran our experiments on opinions about IT devices and on texts taken from general news; the results show that choosing the right lexicon is decisive for classification accuracy.
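To make the role of a polarity lexicon concrete, here is a minimal sketch of lexicon-based polarity classification. It is not the method evaluated in the work above: the scoring rule (summing word polarities), the function name `classify_polarity` and the toy lexicon entries are all assumptions made for illustration.

```python
def classify_polarity(tokens, lexicon):
    """Classify a tokenized text by summing the polarity of its words.

    `lexicon` maps lowercase words to a polarity score
    (positive > 0, negative < 0). Words missing from the lexicon count as 0.
    """
    score = sum(lexicon.get(tok.lower(), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical toy lexicon for an IT-product domain.
it_lexicon = {"fast": 1, "reliable": 1, "slow": -1, "overheats": -1}

label = classify_polarity(["The", "laptop", "is", "fast", "and", "reliable"], it_lexicon)
```

Swapping `it_lexicon` for a lexicon built from another domain changes which words contribute to the score, which is one way to see why lexicon choice can affect classification accuracy.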
Entity-Oriented Opinion Detection from Web News (Entitásorientált véleménydetekció webes híranyagokból)
A significant share of news reporting now takes place in digital form, and automatically determining the polarity of opinions about the entities mentioned in news items can bring substantial benefits. We therefore set out to categorize the opinions about entities found in the OpinHuBank database. Relying, among other things, on dependency parsing of the text units, our proposed solution takes the entities' role in the sentence into account and so gives a more accurate picture of the opinions expressed about them.